The purpose of determining the fundamental matrix (F) is to define the epipolar geometry and to
relate two 2D images of the same scene, or frames of a video sequence, in order to recover the 3D
scene. The problem we address in this work is the estimation of the localization error and the
processing time. We start by comparing the following feature extraction techniques: Harris,
features from accelerated segment test (FAST), scale invariant feature transform (SIFT) and
speeded-up robust features (SURF), with respect to the number of detected points and correct
matches under different changes in the images. Then, we merge the best detector with the
objective function, which groups the descriptors by region in order to calculate F. Next, we
apply the normalized eight-point algorithm, which also automatically eliminates the outliers, to
find the optimal solution F. Our optimization approach is tested on real images with different
scene variations. The simulation results are good in terms of accuracy: the computation time of F
does not exceed 900 ms and the projection error is at most 1 pixel, regardless of the
modification.

1. INTRODUCTION

The relationship that links two or more images of a scene under different changes is called epipolar
geometry [1]. The matrix that expresses this relationship between the views is called the fundamental
matrix (F). The determination of F is related to the calibration of the intrinsic and extrinsic parameters
of the camera, and its applications include real-time processing [2]-[5]. The determination of F is based
on the remarkable points detected in the images. These points can be detected manually or automatically.
The manual selection of points produces many errors when we have several images or videos, whereas the
automatic technique performs better.

There are several automatic image feature extraction methods, for example, Harris [6], features from
accelerated segment test (FAST) [7], scale invariant feature transform (SIFT) [8], and speeded-up robust
features (SURF) [9]. In this work, we use automatic techniques to extract key points from images with
different variations. In the literature, we find that FAST and Harris give good results in terms of
computation time but are sensitive to orientation and lighting variations. On the other hand, SIFT and
SURF perform acceptably regardless of the scene modifications [10]. After the key points are extracted
and matched, F is calculated by the standard robust eight-point method [11]. The matched points are the
input to the calculation of F [12], and this step is delicate for the determination of F [13]. Among the
techniques for evaluating the extracted points, we find in the literature least median of squares (LMedS)
[14], random sample consensus (RANSAC) [15] and the M-estimator [16]. In this paper, we present a
technique for determining the relationship between two images with different modifications in order to
optimize the computation time and the projection error. The simulation results obtained are good enough
for the technique to be applied to real-time processing.


2. RESEARCH METHOD 
2.1. Feature extraction techniques 

All F estimation techniques require a number of matched points as input [17]. In the literature, there
are two categories of descriptor extraction, one based on the gray-level intensity function and the other
based on binary descriptors (LBP, TP-LBP and FP-LBP) [18]-[21]. In this article, we extract the features
with the techniques cited in the introduction.

We then compare them in terms of the number of points extracted, the similarity, and the processing
time under different variations of the scene. Mikolajczyk and Schmid [22] compared Harris [6], FAST [7],
SIFT [10] and other detection techniques, and noticed that SIFT gave very good results under rotation,
change of viewpoint and change of scale. According to [11], the SIFT descriptor detects a larger number
of descriptors than the SURF method [21], but the latter is more efficient in terms of processing.


2.2. SURF detector (speeded-up robust features)

The SURF detector [21] proposes a method for the local description of points of interest. It is strongly
influenced by the SIFT approach, in that it couples a registration step of the analysis area with the
construction of a histogram of oriented gradients. The calculation process consists of two steps, the
extraction of the points of interest and their description, and each step contains three sub-steps.

i) Extraction of the points of interest. Create an approximation of the Hessian matrix: this is done by
convolving the integral image with the Gaussian filter at an increasing scale factor. Calculate the
responses of the kernels used: this is done by subtracting two neighboring images belonging to the same
octave in order to determine the stable points. Find the maxima in scale and in space: this step is based
on the Taylor expansion of the difference of Gaussians (DoG) function in order to determine the position
and scale of the detected points.

ii) Description of the points of interest. Determine the size of the descriptor window: the window is
circular and is characterized by its radius and scale. Get the dominant orientation: the calculation
consists of determining the rotation (or registration) angle to be applied to the local description
window. To do this, the authors apply Haar wavelets to the integral image, which considerably reduces the
computation time. These wavelets make it possible to calculate the first derivative of the image on a
square neighborhood and thus to study the distribution of horizontal and vertical gradients. The wavelet
responses are then used to plot the distribution of the gradients and to deduce the angle of alignment on
the initial image. Extract the SURF descriptor: at each position, the x and y responses within the segment
are summed and used to form a new vector of size 64.

The different steps of the algorithm are summarized in Figure 1.


[Figure 1 flowchart: create an integral image; extraction of points of interest (create an approximation
of the Hessian matrix, calculate the responses of the kernels used, find maxima in scale and space);
description of points of interest (determine the size of the descriptor, get the dominant orientation,
extract the SURF descriptor)]

Figure 1. The stages of the creation of descriptor vectors by the SURF technique
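
The first stage of Figure 1, the integral image, is what makes the later filter responses cheap to compute. The following is a minimal NumPy sketch of that idea only, not of the full SURF detector; the function names `integral_image` and `box_sum` are illustrative and do not come from the paper.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with an extra zero row and column at the top-left."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0, dtype=np.float64), axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] using four lookups, whatever the box size."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

# Small synthetic image (hypothetical data, for illustration only).
img = np.arange(20, dtype=np.float64).reshape(4, 5)
ii = integral_image(img)
fast = box_sum(ii, 1, 1, 3, 4)
slow = img[1:3, 1:4].sum()
print(fast, slow)
```

Because every box sum costs four lookups, the cost of a box-shaped filter response does not grow with the filter size, which is why the scale factor can be increased without resampling the image.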


2.3. Calculation of the fundamental matrix F
The matrix F is calculated from points matched between two or more images with different variations
of the scene. The selection of matching points between the images is based on the Euclidean distance
metric. Each selected point corresponds to a straight line in the other image, called the epipolar line.
Equation (1) relates the points selected under different variations of the scene.


$$ m_i'^{\top} F \, m_i = 0 \tag{1} $$


This relation connects the points extracted from two images of the same scene, where
$m_i = (x_i, y_i, w_i)^{\top}$ and $m_i' = (x_i', y_i', w_i')^{\top}$ are the coordinates of the matched
points, written as three-component vectors, and F is the fundamental matrix that connects the two images
with different changes.
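
Relation (1) can be checked numerically on synthetic data. The sketch below assumes two cameras with identity intrinsics, so that F reduces to the product of the skew-symmetric matrix of the translation and the rotation; the rotation, translation and 3D points are hypothetical values chosen for illustration.

```python
import numpy as np

def skew(t):
    """Matrix [t]_x such that skew(t) @ v equals the cross product t x v."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical relative pose of the second camera.
a = np.deg2rad(10)
R = np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
t = np.array([0.5, 0.1, 0.0])
F = skew(t) @ R  # fundamental matrix under the identity-intrinsics assumption

rng = np.random.default_rng(0)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(5, 3))  # 3D points in front of both cameras

m = X              # rows are (x_i, y_i, w_i) in the first image
m_p = X @ R.T + t  # rows are (x_i', y_i', w_i') in the second image

# m_i'^T F m_i for every pair; all values should vanish up to rounding.
residuals = np.einsum("ij,jk,ik->i", m_p, F, m)
singular_values = np.linalg.svd(F, compute_uv=False)
print(residuals, singular_values)
```

The product `F @ m_i` gives the coefficients of the epipolar line in the second image, and the third singular value of F is zero, which is the rank-2 property used later in the text.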


2.4. Statistical technique of calibration and evaluation 

There are several techniques for determining F, all of which are based on the matched points. These fall
into two categories: classical linear methods and nonlinear methods [14]. According to [22], the linear
methods have a limited tolerance to noise. The nonlinear methods (M-estimators, RANSAC, and LMedS) are
robust to noise according to [23]. The M-estimator technique provides an excellent classification for
noisy images and can also take outliers into account, whereas the RANSAC and LMedS techniques, unlike the
M-estimators, are sensitive to noisy images and do not retain the outliers when computing F. Equation (2)
represents the link between the points detected under different variations of the scene.


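$$ A f = 0 \tag{2} $$


where f contains the elements of F and each row of A is built from one pair of matched points:


$$ f = (f_{11}, f_{12}, f_{13}, f_{21}, f_{22}, f_{23}, f_{31}, f_{32}, f_{33})^{\top} $$

$$ A = \begin{pmatrix}
x_1' x_1 & x_1' y_1 & x_1' w_1 & y_1' x_1 & y_1' y_1 & y_1' w_1 & w_1' x_1 & w_1' y_1 & w_1' w_1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
x_n' x_n & x_n' y_n & x_n' w_n & y_n' x_n & y_n' y_n & y_n' w_n & w_n' x_n & w_n' y_n & w_n' w_n
\end{pmatrix} $$

When we expand (2), we obtain equations in the nine elements of F, and each equation expresses the
relationship between a pair of matched points in the two images. The calculation of F relies on the best
choice of detected points, and the disadvantage of this technique appears when the detected points are
incorrectly located [24], [25]. If the rank of F is different from two, the solution of the optimization
problem (3) is obtained by the least squares technique.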


$$ \min_{F} \sum_i \left( m_i'^{\top} F \, m_i \right)^2 \tag{3} $$


Problem (3) is solved by the singular value decomposition (SVD) technique. This technique was improved
by Hartley, according to [26]-[28], to make the method even more efficient. This way of proceeding
notably improved the performance of the eight-point method.
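
The linear solution described above can be sketched in a few lines of NumPy. This is an illustrative implementation of the normalized eight-point method on synthetic data, not the authors' code; the normalization (centroid at the origin, mean distance sqrt(2)) is the usual Hartley choice, and the camera matrix, pose and function names are assumptions made for the example.

```python
import numpy as np

def hartley_normalize(pts):
    """Shift the centroid to the origin and scale the mean distance to sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.linalg.norm(pts - c, axis=1).mean()
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
    homog = np.column_stack([pts, np.ones(len(pts))])
    return homog @ T.T, T

def eight_point(pts1, pts2):
    """Estimate F from n >= 8 pixel correspondences so that m'^T F m = 0."""
    m, T1 = hartley_normalize(pts1)
    mp, T2 = hartley_normalize(pts2)
    # Row i of A is the flattened outer product of m'_i and m_i (n x 9).
    A = np.einsum("ij,ik->ijk", mp, m).reshape(len(m), 9)
    # f is the right singular vector associated with the smallest singular value.
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value of F.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    # Undo the normalization and fix the arbitrary scale.
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)

def project(K, R, t, X):
    """Pinhole projection of 3D points X (n x 3) to pixel coordinates."""
    x = (X @ R.T + t) @ K.T
    return x[:, :2] / x[:, 2:]

# Synthetic, noise-free scene (hypothetical values).
rng = np.random.default_rng(0)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(12, 3))
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
a = np.deg2rad(10)
R = np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
pts1 = project(K, np.eye(3), np.zeros(3), X)
pts2 = project(K, R, np.array([0.5, 0.1, 0.0]), X)

F = eight_point(pts1, pts2)
m = np.column_stack([pts1, np.ones(12)])
mp = np.column_stack([pts2, np.ones(12)])
residuals = np.einsum("ij,jk,ik->i", mp, F, m)
sv = np.linalg.svd(F, compute_uv=False)
print(np.abs(residuals).max(), sv)
```

On noise-free data the residuals vanish up to rounding; with real matches the same code returns the least squares solution of (3), which is why a robust stage is needed to deal with outliers.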


2.5. Robust methods 

The iterative statistical techniques give good results when the data is noisy; among these techniques
are RANSAC [15], [29], [30], LMedS [14], and M-estimators [27]. The RANSAC technique relies on setting
thresholds to select the right descriptors for the estimation of F [25], [31]. The LMedS technique
chooses the points according to the distance between each point and its epipolar line in order to
determine the best F. The last technique, the M-estimator, is integrated with the two previous
techniques and divides the studied points into two groups (inliers and quasi-inliers). Equation (4)
presents the problem to be addressed.


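$$ \min_{F} \sum_i w_i \, r_i^2 \tag{4} $$

where $w_i$ is the weight function and $r_i$ is the residual of the i-th pair of points, as referred to by
[13], [25]:

$$ r_i = f_{11} x_i' x_i + f_{12} x_i' y_i + f_{13} x_i' w + f_{21} y_i' x_i + f_{22} y_i' y_i
       + f_{23} y_i' w + f_{31} w' x_i + f_{32} w' y_i + f_{33} w' w $$

In this expression, $w$ and $w'$ denote the third coordinates of $m_i$ and $m_i'$, not the weight $w_i$.

The weight $w_i$ is a piecewise function of the residual (5): it is equal to 1 for the smallest residuals
and to 0 for the largest ones, with the intermediate regions delimited by thresholds expressed in terms of
$\varphi$ and $\sigma$. Here $\varphi$ is the separator between the outlier and quasi-outlier regions,
$\sigma$ is the scale of the error, computed from the median of the residuals, $\operatorname{median}(r_i)$,
and the $\varphi$ take random values in (0, 1).

The Sampson error is:

$$ E_{\mathrm{Sampson}} = \sum_i w_i \,
   \frac{\left( m_i'^{\top} F \, m_i \right)^2}
        {(F m_i)_1^2 + (F m_i)_2^2 + (F^{\top} m_i')_1^2 + (F^{\top} m_i')_2^2} \tag{6} $$

where $(F m_i)_j$, j = 1, 2, is the j-th component of the vector $F m_i$. In the reference article [12],
we find that the statistical method (LMedS) outperforms the RANSAC method in terms of the estimation of
F. The M-estimator method is robust when the images are noisy.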
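
The quantities in (4) to (6) can be illustrated with a short NumPy sketch. The Sampson term follows the usual definition; the weight function is only one common three-region choice and is an assumption: the bounds `lower=1.0` and `upper=3.0` and the constant 0.6745 in the scale estimate are not taken from the paper.

```python
import numpy as np

def sampson_errors(F, m, mp):
    """Per-pair Sampson error; m and mp are (n, 3) arrays of homogeneous points."""
    Fm = m @ F.T    # row i is F m_i
    Ftmp = mp @ F   # row i is F^T m'_i
    r = np.einsum("ij,ij->i", mp, Fm)  # residuals m'_i^T F m_i
    denom = Fm[:, 0] ** 2 + Fm[:, 1] ** 2 + Ftmp[:, 0] ** 2 + Ftmp[:, 1] ** 2
    return r ** 2 / denom

def three_region_weights(r, lower=1.0, upper=3.0):
    """Assumed weights: 1 for inliers, a decaying value in between, 0 for outliers."""
    a = np.abs(r)
    sigma = np.median(a) / 0.6745  # robust scale from the median (assumed constant)
    w = np.zeros_like(a)
    w[a <= lower * sigma] = 1.0
    mid = (a > lower * sigma) & (a <= upper * sigma)
    w[mid] = lower * sigma / a[mid]
    return w

# Pure horizontal translation: matched points must share the same y coordinate.
F = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
m = np.array([[10.0, 20.0, 1.0], [10.0, 20.0, 1.0]])
mp = np.array([[15.0, 20.0, 1.0], [15.0, 21.0, 1.0]])
errors = sampson_errors(F, m, mp)

r = np.array([0.1, -0.2, 0.15, 0.3, 5.0])  # hypothetical residuals
w = three_region_weights(r)
print(errors, w)
```

In an iteratively reweighted scheme, the weights computed from the current residuals would multiply the rows of A before the next SVD, so that pairs with large residuals stop influencing F.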


2.6. The proposed technique 

First, we load two images of the same scene under different variations, apply the Harris, FAST, SIFT,
and SURF algorithms, and compare them; we find SURF to be the most robust under the different
variations. Secondly, we take the descriptors of the latter, normalize them all, and choose eight random
points to build the 8x9 matrix, which is then decomposed by the SVD method to obtain a 3x3 matrix with
the following properties: rank equal to 2 and determinant equal to zero. Finally, we apply the
optimization function within the iterative algorithm to find the optimal solution F. The different
steps are represented in Figure 2.

Figure 2. Proposed algorithm
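
The loop over random eight-point samples can be sketched as follows. This is not the authors' implementation: the paper does not specify the objective function in enough detail to reproduce it, so the median of the Sampson errors is used here as a stand-in score, and the synthetic scene, the number of iterations and all function names are assumptions.

```python
import numpy as np

def hartley_transform(pts):
    """Similarity transform: centroid to the origin, mean distance sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.linalg.norm(pts - c, axis=1).mean()
    return np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])

def fit_eight_point(m, mp):
    """Rank-2 F from (n, 3) normalized homogeneous points, n >= 8."""
    A = np.einsum("ij,ik->ijk", mp, m).reshape(len(m), 9)
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    return U @ np.diag([S[0], S[1], 0.0]) @ Vt

def sampson(F, m, mp):
    Fm, Ftmp = m @ F.T, mp @ F
    r = np.einsum("ij,ij->i", mp, Fm)
    return r ** 2 / (Fm[:, 0] ** 2 + Fm[:, 1] ** 2 + Ftmp[:, 0] ** 2 + Ftmp[:, 1] ** 2)

def estimate_F(pts1, pts2, n_iter=200, seed=0):
    """Keep the eight-point sample with the lowest median Sampson error (assumed score)."""
    rng = np.random.default_rng(seed)
    T1, T2 = hartley_transform(pts1), hartley_transform(pts2)
    ones = np.ones((len(pts1), 1))
    m = np.hstack([pts1, ones]) @ T1.T
    mp = np.hstack([pts2, ones]) @ T2.T
    best_F, best_cost = None, np.inf
    for _ in range(n_iter):
        idx = rng.choice(len(m), size=8, replace=False)
        F = fit_eight_point(m[idx], mp[idx])
        cost = np.median(sampson(F, m, mp))
        if cost < best_cost:
            best_F, best_cost = F, cost
    F = T2.T @ best_F @ T1  # back to pixel coordinates
    return F / np.linalg.norm(F)

def project(K, R, t, X):
    x = (X @ R.T + t) @ K.T
    return x[:, :2] / x[:, 2:]

# Synthetic scene: 32 exact matches followed by 8 gross outliers (hypothetical data).
rng = np.random.default_rng(1)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(40, 3))
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
a = np.deg2rad(10)
R = np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
pts1 = project(K, np.eye(3), np.zeros(3), X)
pts2 = project(K, R, np.array([0.5, 0.1, 0.0]), X)
pts2[32:] = rng.uniform([0, 0], [640, 480], size=(8, 2))

F = estimate_F(pts1, pts2)
h1 = np.column_stack([pts1, np.ones(40)])
h2 = np.column_stack([pts2, np.ones(40)])
err = sampson(F, h1, h2)
print(np.median(err[:32]), err[32:])
```

Because each candidate is fitted on only eight points, a sample that contains an outlier produces an F that fits few other pairs and is rejected by the score; replacing the median with the weighted sum of (4) would give an M-estimator-style variant.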
3. RESULTS AND DISCUSSION

In this part, we study the four key point detectors (Harris, FAST, SIFT and SURF) and then associate
the RANSAC technique with seven normalized points while varying the brightness, the rotation and the
moving object. F is then determined in two tests: without a limit on the number of extracted points, and
with a limit. The results of our simulations are shown in Figure 3, Figure 4, and Figure 5, which
illustrate the first test, without limitation of the points extracted by the different detectors under
the different modifications. The second test, with limitation of the points in order to optimize the
calculation time and the projection error, is shown in Figure 6.


Figure 6. The second test, changing the point of view: (a) moving object, (b) lighting, (c) rotation,
and (d) uniqueness threshold

The effect of limiting the number of points detected by SURF, so as to keep the best eight points for
calculating F, can be seen in Figures 3-6. The different results of the simulation are grouped in Table 1,
which covers the four feature extraction algorithms and our technique. The results in Table 1 show that
the SIFT detector is better than the others in terms of correct matches (inliers). The resulting
projection error is shown in Figure 7. Figure 8 compares our proposal with the other techniques, by scene
variation and feature extraction technique, with the aim of optimizing processing time. The results of
our simulation show that our technique is better in terms of processing time, which does not exceed
900 ms under the different modifications of the scene.
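
Keeping only the best eight matches can be sketched as a nearest-neighbour search on descriptor vectors with the Euclidean distance. Interpreting the uniqueness threshold as a bound on the ratio between the nearest and second-nearest distances is an assumption, as are the value `max_ratio=0.6`, the function name and the random 64-dimensional descriptors used as test data.

```python
import numpy as np

def match_descriptors(d1, d2, max_ratio=0.6, keep=8):
    """Return up to `keep` index pairs (i in d1, j in d2), best matches first."""
    # Pairwise Euclidean distances between descriptors, shape (n1, n2).
    dist = np.linalg.norm(d1[:, None, :] - d2[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    rows = np.arange(len(d1))
    nearest, second = order[:, 0], order[:, 1]
    best, runner_up = dist[rows, nearest], dist[rows, second]
    # Keep a match only if it is clearly better than the runner-up (assumed test).
    unique = best < max_ratio * runner_up
    idx1 = np.flatnonzero(unique)
    idx1 = idx1[np.argsort(best[idx1])][:keep]
    return np.column_stack([idx1, nearest[idx1]])

# Hypothetical descriptors: 20 noisy copies of rows taken from a set of 30.
rng = np.random.default_rng(1)
d2 = rng.uniform(size=(30, 64))
perm = rng.permutation(30)[:20]
d1 = d2[perm] + rng.normal(scale=0.01, size=(20, 64))

matches = match_descriptors(d1, d2)
print(matches)
```

Lowering `max_ratio` rejects more ambiguous matches, which trades the number of retained points against the risk of feeding outliers to the estimation of F.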


Table 1. Comparison of the performance of the four feature extraction algorithms in terms of detected and
matched points, applied to images with different variations of the scene

                                 Without modification           Limitation of points
Measure   Variation     Harris    FAST    SIFT    SURF          extracted by SURF
Kypnt1    In motion        302     449      54     423                 82
          Lighting         213     112      24     152                 18
          Rotation         264      76      16     111                 43
Kypnt2    In motion        328     512     480     439                 63
          Lighting         271     110     142     126                 21
          Rotation         325      91     142      88                 35
Inliers   In motion         40      54     153     150                  8
          Lighting          36      24     105      73                  8
          Rotation          32      16      86      46                  8

Figure 8. Estimation of the processing time according to the different changes of the scene and the detector

4. CONCLUSION

In this paper, we compare our approach with those of the literature according to the following
parameters: the detected points (descriptors), the correct matches, the computation time of the F
estimate and the Sampson projection errors. This work is divided into two techniques. The first
technique consists in merging the different feature extraction techniques (SIFT, FAST and Harris) with
the normalized eight-point algorithm of RANSAC. It yields a high number of descriptors, and a number of
correspondences that is low compared to the number of extracted points. The major disadvantage of this
technique, which eliminates outliers but has a high computing time, is that it cannot be applied to
real-time applications. The second technique (our technique) is based on the fusion of the SURF
detector, with a modified uniqueness threshold, and the normalized eight-point M-estimation algorithm,
in order to obtain the fundamental matrix through different levels of the objective function.